AITopics | polyphone disambiguation

Collaborating Authors

polyphone disambiguation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dict-TTS: LearningtoPronouncewithPrior DictionaryKnowledgeforText-to-Speech

Neural Information Processing SystemsFeb-8-2026, 20:42:48 GMT

Polyphone disambiguation aims to capture accurate pronunciation knowledge fromnaturaltextsequences forreliable Text-to-speech (TTS)systems.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

Europe > Czechia > South Moravian Region > Brno (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Santa Clara County > Sunnyvale (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.37)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.36)

Add feedback

4e9d8aeeab6120c3c83ccf95d4c211d3-Paper-Conference.pdf

Neural Information Processing SystemsAug-14-2025, 18:05:17 GMT

dict-tts, pronunciation, tts system, (13 more...)

Neural Information Processing Systems

Country:

Europe > Czechia > South Moravian Region > Brno (0.04)
Asia > China (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT

Dai, Dongyang, Wu, Zhiyong, Kang, Shiyin, Wu, Xixin, Jia, Jia, Su, Dan, Yu, Dong, Meng, Helen

arXiv.org Artificial IntelligenceJan-2-2025

Grapheme-to-phoneme (G2P) conversion serves as an essential component in Chinese Mandarin text-to-speech (TTS) system, where polyphone disambiguation is the core issue. In this paper, we propose an end-to-end framework to predict the pronunciation of a polyphonic character, which accepts sentence containing polyphonic character as input in the form of Chinese character sequence without the necessity of any preprocessing. The proposed method consists of a pre-trained bidirectional encoder representations from Transformers (BERT) model and a neural network (NN) based classifier. The pre-trained BERT model extracts semantic features from a raw Chinese character sequence and the NN based classifier predicts the polyphonic character's pronunciation according to BERT output. In out experiments, we implemented three classifiers, a fully-connected network based classifier, a long short-term memory (LSTM) network based classifier and a Transformer block based classifier. The experimental results compared with the baseline approach based on LSTM demonstrate that, the pre-trained model extracts effective semantic features, which greatly enhances the performance of polyphone disambiguation. In addition, we also explored the impact of contextual information on polyphone disambiguation.

classifier, polyphone disambiguation, polyphonic character, (14 more...)

arXiv.org Artificial Intelligence

2501.01102

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Hong Kong (0.05)
Asia > China > Beijing > Beijing (0.05)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

External Knowledge Augmented Polyphone Disambiguation Using Large Language Model

Li, Chen

arXiv.org Artificial IntelligenceDec-19-2023

One of the key issues in Mandarin Chinese text-to-speech (TTS) systems is polyphone disambiguation when doing grapheme-to-phoneme (G2P) conversion. In this paper, we introduce a novel method to solve the problem as a generation task. Following the trending research of large language models (LLM) and prompt learning, the proposed method consists of three modules. Retrieval module incorporates external knowledge which is a multi-level semantic dictionary of Chinese polyphonic characters to format the sentence into a prompt. Generation module adopts the decoder-only Transformer architecture to induce the target text. Postprocess module corrects the generated text into a valid result if needed. Experimental results show that our method outperforms the existing methods on a public dataset called CPP. We also empirically study the impacts of different templates of the prompt, different sizes of training data, and whether to incorporate external knowledge.

disambiguation, polyphone disambiguation, polyphonic character, (11 more...)

arXiv.org Artificial Intelligence

2312.1192

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

Jiang, Ziyue, Su, Zhe, Zhao, Zhou, Yang, Qian, Ren, Yi, Liu, Jinglin, Ye, Zhenhui

arXiv.org Artificial IntelligenceOct-19-2023

Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech (TTS) systems. However, previous approaches require substantial annotated training data and additional efforts from language experts, making it difficult to extend high-quality neural TTS systems to out-of-domain daily conversations and countless languages worldwide. This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary (the existing prior information in the natural language). Specifically, we design a semantics-to-pronunciation attention (S2PA) module to match the semantic patterns between the input text sequence and the prior semantics in the dictionary and obtain the corresponding pronunciations; The S2PA module can be easily trained with the end-to-end TTS model without any annotated phoneme labels. Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy and improves the prosody modeling of TTS systems. Further extensive analyses demonstrate that each design in Dict-TTS is effective.

dict-tts, polyphone disambiguation, pronunciation, (13 more...)

arXiv.org Artificial Intelligence

2206.02147

Country:

Europe > Czechia > South Moravian Region > Brno (0.04)
Asia > China (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.92)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Multilingual context-based pronunciation learning for Text-to-Speech

Comini, Giulia, Ribeiro, Manuel Sam, Yang, Fan, Shim, Heereen, Lorenzo-Trueba, Jaime

arXiv.org Artificial IntelligenceJul-31-2023

Phonetic information and linguistic knowledge are an essential component of a Text-to-speech (TTS) front-end. Given a language, a lexicon can be collected offline and Grapheme-to-Phoneme (G2P) relationships are usually modeled in order to predict the pronunciation for out-of-vocabulary (OOV) words. Additionally, post-lexical phonology, often defined in the form of rule-based systems, is used to correct pronunciation within or between words. In this work we showcase a multilingual unified front-end system that addresses any pronunciation related task, typically handled by separate modules. We evaluate the proposed model on G2P conversion and other language-specific challenges, such as homograph and polyphones disambiguation, post-lexical rules and implicit diacritization. We find that the multilingual model is competitive across languages and tasks, however, some trade-offs exists when compared to equivalent monolingual solutions.

machine learning, natural language, pronunciation, (21 more...)

arXiv.org Artificial Intelligence

2307.16709

Country: Asia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.72)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Back-Translation-Style Data Augmentation for Mandarin Chinese Polyphone Disambiguation

Qiang, Chunyu, Yang, Peng, Che, Hao, Xiao, Jinba, Wang, Xiaorui, Wang, Zhongyuan

arXiv.org Artificial IntelligenceNov-17-2022

Conversion of Chinese Grapheme-to-Phoneme (G2P) plays an important role in Mandarin Chinese Text-To-Speech (TTS) systems, where one of the biggest challenges is the task of polyphone disambiguation. Most of the previous polyphone disambiguation models are trained on manually annotated datasets, and publicly available datasets for polyphone disambiguation are scarce. In this paper we propose a simple back-translation-style data augmentation method for mandarin Chinese polyphone disambiguation, utilizing a large amount of unlabeled text data. Inspired by the back-translation technique proposed in the field of machine translation, we build a Grapheme-to-Phoneme (G2P) model to predict the pronunciation of polyphonic character, and a Phoneme-to-Grapheme (P2G) model to predict pronunciation into text. Meanwhile, a window-based matching strategy and a multi-model scoring strategy are proposed to judge the correctness of the pseudo-label. We design a data balance strategy to improve the accuracy of some typical polyphonic characters in the training set with imbalanced distribution or data scarcity. The experimental result shows the effectiveness of the proposed back-translation-style data augmentation method.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2211.09495

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin

Chen, Yi-Chang, Chang, Yu-Chuan, Chang, Yen-Cheng, Yeh, Yi-Ren

arXiv.org Artificial IntelligenceAug-25-2022

Polyphone disambiguation is the most crucial task in Mandarin grapheme-to-phoneme (g2p) conversion. Previous studies have approached this problem using pre-trained language models, restricted output, and extra information from Part-Of-Speech (POS) tagging. Inspired by these strategies, we propose a novel approach, called g2pW, which adapts learnable softmax-weights to condition the outputs of BERT with the polyphonic character of interest and its POS tagging. Rather than using the hard mask as in previous works, our experiments show that learning a soft-weighting function for the candidate phonemes benefits performance. In addition, our proposed g2pW does not require extra pre-trained POS tagging models while using POS tags as auxiliary features since we train the POS tagging model simultaneously with the unified encoder. Experimental results show that our g2pW outperforms existing methods on the public CPP dataset. All codes, model weights, and a user-friendly package are publicly available.

disambiguation, polyphone disambiguation, polyphonic character, (13 more...)

arXiv.org Artificial Intelligence

2203.1043

Country: Asia > Taiwan > Takao Province > Kaohsiung (0.04)

Genre:

Research Report > New Finding (0.35)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.75)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese

Zhang, Song, Zheng, Ken, Zhu, Xiaoxu, Li, Baoxiang

arXiv.org Artificial IntelligenceJul-1-2022

Grapheme-to-phoneme (G2P) conversion is an indispensable part of the Chinese Mandarin text-to-speech (TTS) system, and the core of G2P conversion is to solve the problem of polyphone disambiguation, which is to pick up the correct pronunciation for several candidates for a Chinese polyphonic character. In this paper, we propose a Chinese polyphone BERT model to predict the pronunciations of Chinese polyphonic characters. Firstly, we create 741 new Chinese monophonic characters from 354 source Chinese polyphonic characters by pronunciation. Then we get a Chinese polyphone BERT by extending a pre-trained Chinese BERT with 741 new Chinese monophonic characters and adding a corresponding embedding layer for new tokens, which is initialized by the embeddings of source Chinese polyphonic characters. In this way, we can turn the polyphone disambiguation task into a pre-training task of the Chinese polyphone BERT. Experimental results demonstrate the effectiveness of the proposed model, and the polyphone BERT model obtain 2% (from 92.1% to 94.1%) improvement of average accuracy compared with the BERT-based classifier model, which is the prior state-of-the-art in polyphone disambiguation.

bert, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2207.12089

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning

Shi, Yi, Wang, Congyi, Chen, Yu, Wang, Bin

arXiv.org Artificial IntelligenceJan-31-2021

The majority of Chinese characters are monophonic, i.e.their pronunciations are unique and thus can be induced easily using a check table. As for their counterparts, polyphonic characters have more than one pronunciation. To perform linguistic computation tasks related to spoken Mandarin Chinese, the correct pronunciation for each polyphone must be identified among several candidates according to its context. This process is called Polyphone Disambiguation, a key procedure in the Grapheme-to-phoneme (G2P) conversion step of a Chinese text-to-speech (TTS) system. The problem is well explored with both knowledge-based and learning-based approaches, yet it remains challenging due to the lack of publicly available datasets and complex language phenomenon concerned polyphone. In this paper, we propose a novel semi-supervised learning (SSL) framework for Mandarin Chinese polyphone disambiguation that can potentially leverage unlimited unlabeled text data. We explore the effect of various proxy labeling strategies including entropy-thresholding and lexicon-based labeling. As for the architecture, a pre-trained model of Electra is combined with Convolution BLSTM layers to fine-tune on our task. Qualitative and quantitative experiments demonstrate that our method achieves state-of-the-art performance in Mandarin Chinese polyphone disambiguation. In addition, we publish a novel dataset specifically for the polyphone disambiguation task to promote further researches.

polyphone disambiguation, polyphonic character, pronunciation, (10 more...)

arXiv.org Artificial Intelligence

2102.00621

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback